Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment #14984
Merged: awaelchli merged 15 commits into Lightning-AI:master from speediedan:enable_ddp_fork_strategy_with_native_amp on Oct 5, 2022
+65 −6
speediedan requested review from awaelchli, carmocca, justusschock, rohitgr7 and otaj as code owners on October 4, 2022 00:33
speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch 2 times, most recently from 51f7669 to 09a42e5, on October 4, 2022 03:14
Codecov Report
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #14984    +/-   ##
=========================================
+ Coverage      83%      84%     +1%
=========================================
  Files         396      276    -120
  Lines       29089    21245   -7844
=========================================
- Hits        24074    17842   -6232
+ Misses       5015     3403   -1612
carmocca reviewed on Oct 4, 2022
awaelchli approved these changes on Oct 4, 2022
awaelchli added the "accelerator: cuda" and "precision: amp" labels on Oct 4, 2022
carmocca reviewed on Oct 4, 2022
…DA availability assessment
…e` with its NVML-based version if possible
…pe of new ddp_fork test
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
… new `_patch_cuda_is_available` context manager
speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch from f818027 to 8441726 on October 5, 2022 01:05
speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch from 867f7c7 to 6cc8e44 on October 5, 2022 01:15
for more information, see https://pre-commit.ci
carmocca approved these changes on Oct 5, 2022
mergify bot added the "ready" and "has conflicts" labels and removed the "ready" label on Oct 5, 2022
mergify bot added the "ready" label and removed the "has conflicts" and "ready" labels on Oct 5, 2022
justusschock approved these changes on Oct 5, 2022
…, we only need 1 GPU
…he last few hours
nicolai86 pushed a commit that referenced this pull request on Oct 13, 2022
…DA availability assessment (#14984) Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]>
Labels
accelerator: cuda (Compute Unified Device Architecture GPU)
community (This PR is from the community)
pl (Generic label for PyTorch Lightning package)
precision: amp (Automatic Mixed Precision)
ready (PRs ready to be merged)
What does this PR do?
Fixes #14981
`ddp_fork` (and its associated alias strategies) cannot currently be used along with native AMP because the call to `GradScaler` in the `NativeMixedPrecisionPlugin` invokes the CUDA Runtime API:
https://github.com/Lightning-AI/lightning/blob/c059db446e7bfea03fba91e598ad503f0d1c6581/src/pytorch_lightning/plugins/precision/native_amp.py#L53
That invocation initializes CUDA and poisons subsequent forks.
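To make the fork problem concrete, here is a minimal standard-library analogy rather than real CUDA code. The flag `_runtime_initialized` and both helper functions are hypothetical stand-ins: they only show that whatever state the parent sets up before forking is inherited by the forked child. That inheritance is the reason an already-initialized CUDA runtime in the parent is a problem for fork-based strategies. The sketch assumes a platform where the `fork` start method is available (e.g. Linux).

```python
import multiprocessing as mp

# Hypothetical stand-in for "the CUDA runtime has been initialized in this process".
_runtime_initialized = False


def mark_runtime_initialized() -> None:
    """Simulate a call that initializes process-wide state as a side effect."""
    global _runtime_initialized
    _runtime_initialized = True


def _report_flag(queue) -> None:
    # Runs in the child: report the flag value the child inherited at fork time.
    queue.put(_runtime_initialized)


def forked_child_sees_flag() -> bool:
    """Fork a child process and return the flag value observed inside it."""
    ctx = mp.get_context("fork")  # assumption: fork is supported on this platform
    queue = ctx.Queue()
    proc = ctx.Process(target=_report_flag, args=(queue,))
    proc.start()
    seen = queue.get(timeout=30)
    proc.join()
    return seen
```

Calling `forked_child_sees_flag()` before and after `mark_runtime_initialized()` shows the child picking up the parent's state. With CUDA, the inherited state is an initialized runtime that the child cannot safely reuse, so the assessment has to avoid initializing it in the first place.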
It may be possible with a future version of PyTorch to alter the default behavior of `torch.cuda.is_available()` to use an NVML-based CUDA assessment throughout Lightning. In the meantime, patching `torch.cuda.is_available()` with Lightning's implementation of the upstream NVML-based assessment can unlock this functionality.
This PR patches `torch.cuda.is_available()` within `NativeMixedPrecisionPlugin` (both Lite and PL versions) and adds a standalone test for the `ddp_fork` strategy in a CUDA and native AMP context (a standalone test is added only for PL, given how expensive the standalone multi-GPU tests can be).
After this PR, it should be possible to use `ddp_fork` (and its associated alias strategies) with native AMP, e.g. in Jupyter notebooks. Note that this PR does not address or pertain to bf16-based AMP support.
Additional context
There's currently a related PR in PyTorch that may allow the requested modification of `torch.cuda.is_available()` throughout Lightning without needing to patch the function or add Lightning's own NVML-based assessment (once the relevant version of PyTorch is the minimum).
Does your PR introduce any breaking changes? If yes, please list them.
None
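For readers unfamiliar with the patching approach described under "What does this PR do?", the following is a rough sketch of a temporary-patch context manager. It is not the code from this PR. `patch_is_available`, `fake_cuda` and `nvml_based_check` are hypothetical names, and a `SimpleNamespace` stands in for `torch.cuda` so the example runs without PyTorch or a GPU.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Callable, Iterator


@contextmanager
def patch_is_available(cuda_module, replacement: Callable[[], bool]) -> Iterator[None]:
    """Temporarily swap `cuda_module.is_available` for `replacement`."""
    original = cuda_module.is_available
    cuda_module.is_available = replacement
    try:
        yield
    finally:
        # Always restore the original function, even if the body raises.
        cuda_module.is_available = original


# Hypothetical stand-in for `torch.cuda`; its default check reports False here.
fake_cuda = SimpleNamespace(is_available=lambda: False)


def nvml_based_check() -> bool:
    # Placeholder for an NVML-based assessment that does not initialize CUDA.
    return True


with patch_is_available(fake_cuda, nvml_based_check):
    inside = fake_cuda.is_available()  # answered by the replacement
after = fake_cuda.is_available()  # original behavior restored
```

Restoring the original in a `finally` block limits the patch to the code that needs it, here the construction of the scaler, and leaves the rest of the program unaffected.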
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃